BMC Research Notes

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match BMC Research Notes's content profile, based on 29 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Consanguinity, Inbreeding Coefficient, Infant Mortality and congenital anomalies evaluation in the population of Faisalabad

Khalid, S.; Hassan, M.

2026-02-03 epidemiology 10.64898/2026.02.01.26345314 medRxiv
Top 0.1%
3.8%

Background: Consanguineous unions are marriages between individuals who are blood relatives. Researchers around the world have studied the prevalence and effects of consanguinity in different regions. This research was conducted in District Faisalabad, upper Punjab. Objective: To determine the rate of consanguinity, the coefficient of inbreeding (F), and its impacts. Methods: Data were collected from six tehsils of District Faisalabad by interviewing subjects over a period of six months. A total of 2366 subjects were interviewed after giving their consent. Results: The rate of consanguinity was 41.83%, with a coefficient of inbreeding of 0.03053. A high rate of consanguinity (23.36%) was noted among first cousins. Distantly related and unrelated unions accounted for 35.64% and 22.56%, respectively. The rate of consanguineous unions across the six tehsils ranged from 33.99% in Jaranwala to 53.85% in Tandlianwala. Consanguineous marriages were more frequent among Punjabi-speaking subjects, housewives, reciprocal marital types, grandparent and one-couple family types, and Rajpoot castes. No significant difference in consanguinity was found between rural and urban areas. The rate of stillbirths was high (82.25%) in consanguineous unions, while neonatal, post-neonatal, and child mortality were low, at 6.45%, 8.06%, and 3.22%, respectively. Prenatal mortality was slightly higher (44.94%) in consanguineous unions than in non-consanguineous unions. The congenital malformation rate was 6.29% across all marital unions, but it was higher in consanguineous unions (59.06%) than in non-consanguineous unions (40.93%). This is a pilot study analyzing the potential of the inbreeding coefficient (F) in District Faisalabad.
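A population-level coefficient of inbreeding such as the one reported above is usually a weighted mean: each union type contributes its share of all unions multiplied by the standard pedigree value of F for that relationship. The sketch below illustrates that weighted sum only. The function name and the input dictionary are hypothetical, and because the abstract reports just the first-cousin share (23.36%), the example computes that category's contribution alone and does not attempt to reproduce the published value of 0.03053.

```python
# Standard pedigree inbreeding coefficients for the offspring of each union type.
F_BY_UNION = {
    "double_first_cousin": 1 / 8,
    "first_cousin": 1 / 16,
    "first_cousin_once_removed": 1 / 32,
    "second_cousin": 1 / 64,
    "unrelated": 0.0,
}


def mean_inbreeding_coefficient(shares):
    """Return sum(share_i * F_i) for a dict mapping union type -> share of all unions."""
    if sum(shares.values()) > 1 + 1e-9:
        raise ValueError("shares of all unions cannot sum to more than 1")
    return sum(share * F_BY_UNION[kind] for kind, share in shares.items())


# Contribution of first-cousin unions alone, using the 23.36% share from the abstract.
first_cousin_part = mean_inbreeding_coefficient({"first_cousin": 0.2336})
print(round(first_cousin_part, 4))  # 0.0146
```

The remaining consanguineous categories would add their own terms to this sum; their shares are not given in the abstract, so they are left out here.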

2
Estimating the new event-free survival

Vilsmeier, J.; Saadati, M.; Miah, K.; Benner, A.; Doehner, H.; Beyersmann, J.

2026-03-26 oncology 10.64898/2026.03.25.26349169 medRxiv
Top 0.1%
3.7%

Background: In acute myeloid leukemia studies, event-free survival (EFS) is defined as time until treatment failure, relapse, or death, whichever occurs first. Since 2020 and 2022, respectively, the US Food and Drug Administration and the European LeukemiaNet recommend analysing treatment failures as day-1 events. This data modification can lead to a potentially large drop in the estimated EFS at day 1. If censoring occurs, the Kaplan-Meier estimator obtained from the recoded data underestimates this drop. Our aim is to obtain an unbiased estimate for EFS as basis for further inference. Methods: We define "event on day 1" as one event type and "event after day 1" as a competing event in the original data and use the Aalen-Johansen estimator of the cumulative incidence curve to estimate event-specific transition probabilities, which are combined in one EFS estimate. To analyse effects on day 1 treatment failure and other post-day-1 EFS events separately, a formal link to cure models is established by equating treatment failures with the "cured" proportion in cure model terminology. Additionally, a variance estimator, confidence intervals, confidence bands, and simultaneous testing procedures are derived. Results: Our new estimation method differs from the Kaplan-Meier estimator in settings in which some treatment failures are censored, as in the interim analysis of the AMLSG 09-09 study. If almost no treatment failures are censored, the two estimation methods do not differ. The cure model and simultaneous testing are able to estimate effects on day 1 treatment failure and other post-day-1 EFS events separately and function independently of whether data is modified. Conclusions: The Kaplan-Meier estimator evaluated on the recoded data underestimates the drop at day 1 if treatment failures are censored. With sufficient follow-up, this bias disappears, and results coincide with our novel approach.
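The competing-risks idea in the abstract above can be sketched with a hand-written Aalen-Johansen cumulative incidence estimator. This is a minimal illustration, not the authors' implementation: the function names, the cause labels (1 = treatment failure, 2 = relapse or death), the toy data, and in particular the way the two cumulative incidences are combined into one EFS value are assumptions made for the example.

```python
def aalen_johansen(times, causes):
    """Cumulative incidence curves per cause.

    times  : event or censoring time for each subject
    causes : 0 for censored, otherwise a positive integer cause label
    Returns {cause: [(time, cumulative incidence), ...]} over all distinct times.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv = 1.0  # all-cause Kaplan-Meier just before the current time
    cif = {c: 0.0 for c in set(causes) if c != 0}
    curves = {c: [] for c in cif}
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for (u, c) in data if u == t]
        events = [c for c in tied if c != 0]
        for c in cif:
            cif[c] += surv * events.count(c) / at_risk
            curves[c].append((t, cif[c]))
        surv *= 1 - len(events) / at_risk
        at_risk -= len(tied)
        i += len(tied)
    return curves


def efs_with_day1_drop(curves, t, failure=1, other=2):
    # Assumption: every treatment failure is counted at day 1, so the final
    # cumulative incidence of cause `failure` is subtracted for all t >= 1.
    failure_total = curves[failure][-1][1]
    other_at_t = max([p for (u, p) in curves[other] if u <= t], default=0.0)
    return 1 - failure_total - other_at_t


# Toy data (hypothetical): days to event, with 0 = censored.
times = [30, 45, 60, 90, 120, 200]
causes = [1, 0, 1, 2, 0, 2]
curves = aalen_johansen(times, causes)
print(round(efs_with_day1_drop(curves, 1), 3))    # 0.625
print(round(efs_with_day1_drop(curves, 100), 3))  # 0.417
```

In this toy data set, recoding the two observed treatment failures to day 1 and applying Kaplan-Meier would give a day-1 drop of 2/6, whereas the sketch gives 0.375 because the subject censored at day 45 was still at risk of treatment failure.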

3
A supervised digital game intervention supports language and communication in young children.

Pena, M.; Dehaene-Lambertz, G.; Pino, E.; Pittaluga, E.; Cortes, P.; de la Riva, C.; Palacios, O.; Guevara, P.

2026-04-04 developmental biology 10.64898/2026.04.02.716239 medRxiv
Top 0.1%
2.1%

The role of digital media in early childhood development remains highly debated, particularly regarding its impact on language acquisition. While excessive or unsupervised screen exposure has been linked to poorer outcomes, less is known about whether structured and interactive uses of technology can support learning. Building on previous research, we evaluated a brief, educator-supervised tablet-based intervention in 246 children aged 2-5 years from low- to middle-socioeconomic backgrounds attending public early education centers. Using a pre-post design with matched study and control groups, children completed 4-8 short training sessions (15 minutes each) involving interactive word-image associations spanning multiple linguistic categories. Preschoolers additionally engaged in prompted vocalization. Across age groups (2-3, 3-4, and 4-5 years), children in the intervention showed greater gains in language comprehension than controls, including receptive language in toddlers (β = 0.49, p = 0.009), vocabulary and morphology in younger preschoolers (β = 0.59-0.68, all p < 0.05), and grammar comprehension in older preschoolers (β = 0.30, p = 0.038). These effects were consistent after accounting for child and parental characteristics. Together, these findings suggest that the developmental impact of digital media depends less on exposure itself than on how it is used. When embedded in structured, socially guided interactions, even brief tablet-based activities may support early language development.

4
Community action for newborn care and survival through participatory women's groups and health workers in rural Bangladesh: a before-and-after implementation study of scale-up.

Fottrell, E.; Akter, K.; Kuddus, A.; Kumar Shaha, S.; Nahar, B.; Azam, G.; Nahar, T.; Costello, A.; Azad, K.

2026-01-30 public and global health 10.64898/2026.01.28.26344942 medRxiv
Top 0.1%
1.8%

Background: Community mobilisation through participatory women's groups (PWGs) has been shown to be an effective intervention to improve maternal and neonatal survival in low-income settings, including Bangladesh. Despite WHO recommendations and scale-up in some contexts, the intervention has not been widely scaled up in Bangladesh. To add to the existing evidence base for PWGs and to renew calls for effective, scalable interventions to improve neonatal outcomes in the post-Sustainable Development Goals era, we report the design, implementation and evaluation of a volunteer-led model for PWGs delivered in rural Bangladesh in 2014/15. Methods: Working in three rural unions in Faridpur, Bangladesh, we applied a volunteer-led, lower coverage and shorter duration PWG intervention. Mixed methods evaluation monitored key indicators of intervention delivery, uptake and receipt. Prospective quantitative surveys gathered data on birth outcomes, health care utilisation and essential newborn care practices. Data from before and after the implementation period were compared and interpreted in relation to historical trends in the study area and other rural areas of Bangladesh. Results: 180 participatory women's groups facilitated by 45 volunteer facilitators over a period of 15 months were successfully implemented, giving a population coverage of one group per 500 population. An average of 32 (min.=18, max.=64) participants attended each PWG meeting, 42% of participants attended meetings on a monthly basis and 11% reported that they actively shared information from the PWGs with non-attenders. 30% of women of reproductive age and 54% of pregnant women participated in the. Focus group discussions with participants and community members revealed positive attitudes towards the groups.
A change in trend in extended perinatal mortality rates was observed during the intervention period, corresponding temporally with indicators of improved rates of service utilisation and essential newborn care practices relative to the pre-implementation period. Conclusion: The modified PWG intervention likely contributed to positive changes in delivery and neonatal care practices similar to previous studies in Bangladesh. The PWG model remains an important approach to community empowerment that could contribute to enhanced efforts to end preventable neonatal deaths as we move towards the end of the Sustainable Development Goal era and beyond.

5
Interdependent Patient-Reported Outcome Patterns During Breast Cancer Pharmacotherapy: A Correlation-Based Analysis Using EORTC QLQ-C30 and QLQ-BR23

Sutanto, H.; Savitri, M.; Hendarsih, E.; Ashariati, A.

2026-02-11 oncology 10.64898/2026.02.10.26345961 medRxiv
Top 0.1%
1.8%

Background: Quality-of-life (QoL) assessment is essential in breast cancer care, yet limited evidence describes how interrelated QoL domains change during pharmacotherapy. This study aimed to evaluate correlations among functional and symptom scales using the EORTC QLQ-C30 and QLQ-BR23, highlighting their ability to reveal multidimensional QoL patterns. Methods: A prospective observational study was conducted in two second-referral hospitals in Indonesia, enrolling 106 female breast cancer patients. QoL was assessed before and after pharmacotherapy using QLQ-C30 and QLQ-BR23. Changes in scores (Δ) were computed, and interdomain relationships were analyzed using Spearman's rho. Results: Physical functioning correlated with role functioning (ρ = 0.55, p < 0.001), emotional functioning (ρ = 0.33, p < 0.001), and social functioning (ρ = 0.31, p = 0.002). Role and social functioning were likewise correlated (ρ = 0.32, p = 0.001), indicating that improvements across functional domains tended to occur in parallel. Symptom scales showed strong positive clustering, including fatigue with pain (ρ = 0.37, p < 0.001), insomnia (ρ = 0.35, p < 0.001), and systemic side effects (ρ = 0.48, p < 0.001). Functional and symptom domains generally exhibited inverse relationships: physical functioning negatively correlated with fatigue (ρ = -0.40), pain (ρ = -0.43), both p < 0.001, and systemic side effects (ρ = -0.26; p = 0.01). Conclusion: The QLQ-C30 and QLQ-BR23 instruments effectively captured structured, clinically meaningful interdependencies. Functional improvements consistently aligned with symptom reductions, revealing coherent functional-symptom clustering. These findings underscore the sensitivity of QoL instruments to detect multidimensional patient-reported changes during breast cancer pharmacotherapy.

6
Image Analysis Tools for Electron Microscopy

Shtengel, D.; Shtengel, G.; Xu, C. S.; Hess, H. F.

2026-03-14 bioinformatics 10.64898/2026.03.11.711125 medRxiv
Top 0.1%
1.7%

Electron Microscopy (EM) is widely used in many scientific fields, particularly in life sciences, offering high-resolution information on the ultrastructure of biological organisms. Accurate characterization of EM image quality is important for assessing the EM tool performance, in addition to sample preparation protocol, imaging conditions, etc. This paper provides an overview of tools we developed as plugins for the popular image processing package Fiji (ImageJ) (1). These tools include signal-to-noise ratio analysis, contrast evaluation, and resolution analysis, as well as the capability to import images acquired on custom FIB-SEM instruments (2). We have also made these tools available in Python, with both versions available on GitHub.

7
The Health Interventions Impact Calculator (HIIC): scaling up web-based access to proportional multistate lifetable analyses of avoidable burden, health gain and economic impacts.

Khuu, S.; Wilson, T.; Dhungel, B.; Howe, S.; Blakely, T.

2026-02-03 epidemiology 10.64898/2026.02.01.26345324 medRxiv
Top 0.1%
1.7%

Health metrics and modelling capacity have expanded to address present burden and burden attributable to risk factors in the past. There remains a gap in accessible tools that estimate avoidable burden, that is, the future health and economic impacts of preventive and treatment interventions. This paper describes and demonstrates the Health Interventions Impact Calculator (HIIC), a free web-based analysis and visualisation tool that allows for rapid estimation of the future health and economic impacts of user-specified intervention scenarios for multiple diseases and risk factors. HIIC draws on precomputed outputs from the Scalable Health Intervention Evaluation programme (SHINE) proportional multistate lifetable (PMSLT) models. Users define an intervention scenario by specifying intervention timing, target population, intervention cost, and then modifying either disease rates or risk factor exposures. HIIC currently reports outcomes as differences between a business-as-usual (BAU) baseline and the intervention scenario, including health-adjusted life years (HALYs), deaths averted, health system expenditure and income impacts, for Australia. Outputs are presented through interactive dashboard visualisations and downloadable results. Three example intervention scenarios for Australia are presented: a 10% reduction in ischemic heart disease incidence, a 10% reduction in cervical cancer case fatality rate, and a BMI reduction of 2.5 kg/m² toward the theoretical minimum risk exposure level (TMREL). Across examples, HIIC generates 10-, 20-, and 40-year projections for health gains, mortality displacement over time, and economic impacts. A comparison of interventions based on cost-effectiveness shows how incremental costs and HALYs gained relative to BAU can differ substantially across intervention types, reflecting both intervention design and the level and trajectory of baseline burden.
HIIC is a world-first accessible framework for standardised comparison of intervention scenarios against BAU that will soon be available for all countries. By linking risk factor and disease trajectory changes to health and economic outcomes within a consistent modelling structure, HIIC can inform transparent and reproducible priority setting for decision makers and researchers alike. Author Summary: Decision-makers need to determine which health interventions offer the greatest health gain for the resources invested, but comparing different options has been difficult. While we can measure current disease burden and past impacts, accessible tools for estimating what could be avoided in the future through new interventions have been lacking. We created a free online tool called the Health Interventions Impact Calculator that allows users to explore future scenarios for health interventions. Users enter details about a proposed intervention, such as reducing obesity, improving disease screening, or enhancing treatment, and the tool estimates future outcomes: deaths averted, health improvements, and costs over the next 10, 20, or 40 years. The tool compares each scenario against a business-as-usual future to show what additional benefits the intervention might achieve. We demonstrated this using three examples in Australia: a body-mass index intervention, a heart disease prevention intervention, and improvements to cervical cancer treatment. Currently available for Australia with over 200,000 ready-to-use scenarios and expanding worldwide in 2026, this tool provides researchers, health organizations, and policymakers with a standardised way to rapidly estimate and compare the future impact of different health interventions, supporting evidence-based decisions about where to invest limited resources.

8
Estimating mean growth trajectories when measurements are sparse and age is uncertain

Bunce, J. A.; Revilla-Minaya, C.; Fernandez, C. I.

2026-02-26 developmental biology 10.64898/2026.02.24.707738 medRxiv
Top 0.1%
1.7%

Background and objectives: Comparing children's growth across the world and at different moments in history can yield insight into both health challenges and healthy morphological variation in our species. A difficulty of such comparative analyses is that, in marginalized populations, there are often logistical complications to obtaining repeat measures of individual children's height and weight. The problem is even more acute for historical populations: bioarchaeological datasets comprise single measures of individuals at death. Additionally, for both contemporary and historical populations, there is often non-trivial uncertainty about children's ages. Both of these factors complicate estimation of growth trajectories. Here we evaluate the degree to which we can accurately estimate a population-mean growth trajectory using only a small number of (randomly) uncertain measurements, like those that compose many contemporary and bioarchaeological datasets. Methodology: We recently derived a causal model of human growth from fundamental principles of metabolism and allometry, permitting exploration of genetic and environmental contributions to children's growth. Here, we fit this model in a Bayesian framework to simulated cross-sectional and longitudinal datasets of varying size, where age is uncertain. Results: We show that, for large-scale comparative purposes, reasonably accurate population-mean growth trajectories may be obtained from single height measures of 100 children. However, detailed analyses of pubertal growth spurts and the metabolic and allometric parameters underlying growth require more extensive longitudinal datasets. Conclusions and implications: We conclude that this new model and estimation strategy constitute a potentially useful toolkit for comparing mean growth trajectories across contemporary and historical populations.

9
Development and Pilot Validation of ABHA-O-SHINE: An AI-Ready Oral Health Risk and Insurance Prediction Framework within the Ayushman Bharat Digital Ecosystem

Saxena, Y.; SHRIVASTAVA, L.

2026-04-01 public and global health 10.64898/2026.03.31.26349846 medRxiv
Top 0.1%
1.5%

Background: Oral health remains inadequately integrated within the Ayushman Bharat Digital Mission (ABDM), particularly in terms of structured risk assessment and its linkage to insurance-based decision-making. There is a growing need for scalable models that can connect clinical oral health data with digital health systems and support future artificial intelligence (AI)-driven applications. Aim: To develop and pilot test the ABHA-O-SHINE framework for oral health risk prediction and insurance prioritization, with a future scope for AI integration within the Ayushman Bharat Health Account (ABHA) ecosystem. Materials and Methods: A cross-sectional pilot study was conducted among 126 participants attending the outpatient department of Swargiya Dadasaheb Kalmegh Smruti Dental College and Hospital, Nagpur. Participants were selected based on predefined inclusion and exclusion criteria. Data collection included a structured questionnaire and clinical examination using the WHO Oral Health Assessment Form (2013). A composite risk score (0 to 14) was developed incorporating behavioral and clinical parameters. Participants were categorized into low, moderate, and high-risk groups, and corresponding insurance priority levels were assigned. Statistical analysis included descriptive statistics, Chi-square test, Spearman correlation, and binary logistic regression. Results: The majority of participants were categorized under moderate to high-risk groups. Tobacco use showed a statistically significant association with higher risk levels (p < 0.05). Positive correlations were observed between total risk score and clinical indicators such as DMFT and CPI. Logistic regression analysis identified tobacco use and clinical scores as significant predictors of high-risk categorization. Conclusion: The ABHA-O-SHINE framework demonstrates feasibility in integrating oral health risk assessment with an insurance prioritization model.
The framework is designed to be AI-compatible, enabling future automation through machine learning and image-based analysis within the ABDM ecosystem. Keywords: ABHA, ABDM, Oral Health, Risk Assessment, Insurance, Artificial Intelligence.

10
A Web Application for Exploring Distribution in Academic Publications Across Geography and Institutions in India

Hou, Y.; Cohen, E.; Higginbottom, J.; Rountree, L.; Ren, Y.; Wahl, B.; Nyhan, K.; Mukherjee, B.

2026-03-20 health informatics 10.64898/2026.03.18.26348755 medRxiv
Top 0.2%
1.5%

India's national research capacity and infrastructure are unevenly distributed across states and union territories (UTs), contributing to geographic variation in academic publication output. We developed Indiapub, an open-access web application that quantitatively enumerates and visually displays geographic and temporal publication patterns for research products with at least one author affiliated with an Indian institution, using OpenAlex data. The app is designed for ease of use, with automated data retrieval, cleaning, and aggregation. Indiapub allows users to filter publications by topic, publication year range, author position, publication type, minimum citation count, state/UT, and population size of the state/UT where the author institution is located. The app also provides downloadable tables and ranked institution lists by publication count. Its interactive dashboard includes five modules: (i) a map of publication distribution, (ii) time trend plots for nation and state/UT, (iii) publication-share versus population-share plots highlighting over- and underrepresentation, (iv) stacked bar charts of state/UT contributions over time with population benchmarks, and (v) bubble plots relating the Human Development Index to publication volume over time. This tool may support resource prioritization and identification of institutional strengths for trainees, researchers, higher education administrators, and policymakers. To illustrate its utility, we present sample findings derived from the app. For publications across all topics from 2014 to 2025, the largest research participation footprints were observed in Tamil Nadu, Maharashtra, Delhi, Uttar Pradesh, and Karnataka. Tamil Nadu and Delhi were home to three of the highest-publishing institutions nationally: Vellore Institute of Technology, All India Institute of Medical Sciences, and Indian Institute of Technology Delhi. 
We also examined six curated case studies of broad scientific interest: electronic health records (EHR), genome-wide association studies (GWAS), artificial intelligence (AI), development economics, environmental science, and COVID-19. Findings from these case studies revealed over- and underrepresentation in publication output across states and UTs. For example, in EHR publications among high-population states, Tamil Nadu's publication share exceeded its population share by 31.3 percentage points (pp), whereas Bihar's was 12.8 pp lower. Our tool offers insights into India's research landscape across states and UTs with easy-to-digest visuals. Such interactive tools have the potential to serve as a starting point for fostering a more inclusive research ecosystem supporting targeted research policy and planning.
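The over- and underrepresentation measure quoted above, publication share minus population share in percentage points, is simple to state in code. The sketch below is illustrative only: the function name, the region labels, and all counts are hypothetical, and it assumes both shares are computed within the same group of regions being compared (as in the abstract's "among high-population states" example).

```python
def share_gap_pp(pub_counts, populations):
    """Publication share minus population share per region, in percentage points.

    Both shares are computed within the set of regions passed in.
    """
    total_pubs = sum(pub_counts.values())
    total_pop = sum(populations.values())
    return {
        region: 100 * pub_counts[region] / total_pubs
        - 100 * populations[region] / total_pop
        for region in pub_counts
    }


# Hypothetical counts for two regions (not taken from Indiapub or OpenAlex).
gaps = share_gap_pp({"A": 60, "B": 40}, {"A": 25, "B": 75})
print(gaps)  # {'A': 35.0, 'B': -35.0}
```

A positive value means a region contributes a larger share of publications than of population; the gaps across the compared regions always sum to zero.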

11
Development and assessment of tailored illustrations to enhance community understandings of genetics topics

Arner, A. M.; McCabe, T. C.; Seyler, A.; Zamri, S. N.; A/P Tan Boon Huat, T. B. T.; Tam, K. L.; Kinyua, P.; John, E.; Ngoci Njeru, S.; Lim, Y. A.; Gurven, M.; Nicholas, C.; Ayroles, J.; Venkataraman, V. v.; Kraft, T. S.; Wallace, I. J.; Lea, A. J.

2026-03-19 scientific communication and education 10.64898/2026.03.17.711941 medRxiv
Top 0.2%
1.4%

Objectives: Effective communication about genetics concepts is essential for collaborative anthropological genetics research. However, communication can be challenging because many ideas are abstract and may be especially unfamiliar to communities with limited access to formal education. Indeed, there are no widely adopted models for communicating such information, nor a clear understanding of the social factors that may shape participant engagement. Here, we conducted a qualitative and quantitative, community-driven study to understand how illustrations can be useful to support concept sharing with two Indigenous groups: the Orang Asli of Peninsular Malaysia and the Turkana of Kenya. Methods: We used a two-phase approach to create and evaluate how illustrations can bolster communication about genetics concepts. First, we created images illustrating answers to frequently asked questions about genetics, iteratively updating the illustrations based on participant feedback. Second, we conducted 92 interviews to evaluate the finalized illustrations' effectiveness. Finally, we analyzed the interview data using thematic analyses, multivariable modeling, and multiple correspondence analyses to identify patterns in participant understanding and feedback, including age, sex, market integration, and schooling. Results: Participants reported high interest in genetics research (92%) and broadly positive perceptions of the illustrations. Familiar, locally grounded imagery was preferred and associated with greater perceived clarity, while more technical illustrations were more frequently reported as confusing. Quantitative analyses showed strong internal consistency across measures of engagement and understanding, with modest variation by degree of market integration, schooling, and sex. Discussion: Our findings demonstrate that community-specific visualizations, co-developed through iterative feedback, can effectively support engagement with genetics research in participant communities.

12
Discordance in pleural mesothelioma response classification and modelling of impact on clinical trials

Cowell, G. W.; Roche, J.; Noble, C.; Stobo, D. B.; Papanastasiou, A.; Kidd, A. C.; Tsim, S.; Blyth, K. G.

2026-03-20 oncology 10.64898/2026.03.18.26348731 medRxiv
Top 0.2%
1.3%

Introduction Agreement between radiologists regarding treatment response in Pleural Mesothelioma (PM) is acknowledged to be poor, but downstream effects in clinical trials have not been quantified. Methods We performed a mixed methods study, composed of a multicentre, retrospective cohort study and in silico modelling. CT images and data were retrieved from 4 UK centres regarding chemotherapy-treated patients. Expert radiologists classified response using modified Response Evaluation Criteria In Solid Tumours criteria (mRECIST) v1.1, generating discordance rate (%) and agreement. In silico modelling simulated two-arm trials of an active therapy with intended 80% power and confidence intervals for four endpoints (objective response rate (ORR), disease control rate (DCR), progression-free survival (PFS), overall survival (OS)) covering 95% of the true effect. Actual power and endpoint coverage were modelled against mRECIST misclassification rate (a single reporter equivalent of discordance rate). Consecutive simulations varied misclassification rate from 0-100% in 1% increments, each repeated 10,000 times. Results 172 cases were included. Discordance rate was 35% (60/172), kappa=0.456. In silico modelling demonstrated reduced power and endpoint precision with increasing misclassification. At 17% misclassification, corresponding to the observed 35% discordance, power dropped from 80% to 55% for ORR, 53% for DCR, 65% for PFS and 66% for OS, with endpoint coverage reduced to 88%, 89%, 92% and 92%, respectively. 50/60 (83%) discordances reflected interpretation or measurement differences intrinsic to mRECIST. Discordance was not associated with tumour volume. Conclusions Inconsistent response classification is common in PM and substantially reduces statistical power and endpoint precision in clinical trials.
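The discordance rate and kappa reported above can both be derived from a cross-tabulation of two readers' classifications. The sketch below shows the arithmetic under stated assumptions: the function name and the example table are hypothetical, and unweighted two-reader Cohen's kappa is assumed because the abstract does not say which kappa variant was used.

```python
def discordance_and_kappa(table):
    """table[i][j] = number of cases rated category i by reader A and j by reader B.

    Returns (discordance rate, unweighted Cohen's kappa).
    """
    n = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(len(table)))
    p_obs = agree / n
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Agreement expected by chance from the two readers' marginal frequencies.
    p_exp = sum(r * c for r, c in zip(row_tot, col_tot)) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return 1 - p_obs, kappa


# Hypothetical 2x2 table (not the study data).
rate, kappa = discordance_and_kappa([[20, 5], [10, 15]])
print(round(rate, 2), round(kappa, 2))  # 0.3 0.4

# The reported discordance rate follows directly from the counts in the abstract.
print(round(100 * 60 / 172))  # 35
```

Kappa is lower than raw agreement because it discounts the agreement two readers would reach by chance given how often each uses every category.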

13
Predicting Depressive Symptoms Among Reproductive-Aged Women in Bangladesh Using Bagging Ensemble Machine Learning on Imbalanced Bangladesh Demographic and Health Survey 2022 Data

Mahmud, S.; Akter, M. S.; Ahamed, B.; Rahman, A. E.; El Arifeen, S.; Hossain, A. T.

2026-04-23 public and global health 10.64898/2026.04.22.26351445 medRxiv
Top 0.2%
1.3%

Background: Depressive symptoms among reproductive-aged women represent a major public health concern in low- and middle-income countries, yet systematic screening remains limited. In most population survey datasets, the low prevalence of depression results in severe class imbalance, which challenges conventional machine learning models. Therefore, we develop and evaluate a bagging-based ensemble machine learning framework to predict depressive symptoms among reproductive-aged women using highly imbalanced Bangladesh Demographic and Health Survey (BDHS) 2022 data. Methods: The sample comprised women aged 15-49 years drawn from BDHS 2022 data. Depressive symptoms were defined using the Patient Health Questionnaire (PHQ-9 ≥ 10). Candidate predictors were drawn from sociodemographic, reproductive, nutritional, psychosocial, healthcare access, and environmental domains. Feature selection was performed using Elastic Net (EN), Random Forest (RF), and XGBoost models. Five classifiers (EN, RF, Support Vector Machine (SVM), K-nearest neighbors (KNN), and Gradient Boosting Machine (GBM)) were trained using both oversampling-based approaches and the proposed ensemble framework. Model performance was evaluated on an independent test set using accuracy, sensitivity, specificity, F1-score, and the normalized Matthews correlation coefficient (normMCC). Results: Approximately 4.8% of women were identified with depressive symptoms. The proposed bagging ensemble framework consistently achieved more balanced predictive performance than oversampling-based models. Average normMCC improved from 0.540 (oversampling) to 0.557 (ensemble). RF and GBM ensembles demonstrated notable improvements in identifying depressive cases, while the EN ensemble achieved the highest overall performance and sensitivity. Threshold optimization yielded stable normMCC across models, indicating robust trade-offs between sensitivity and specificity.
Conclusions Bagging-based ensemble learning provides a more robust and balanced approach than synthetic oversampling for predicting depressive symptoms in highly imbalanced population survey data. This approach has important implications for improving early identification and population-level mental health surveillance in resource-constrained settings.
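The normalized Matthews correlation coefficient used above is commonly defined as (MCC + 1) / 2, which maps the usual range of -1 to 1 onto 0 to 1. The sketch below shows why it is preferred to accuracy on imbalanced data. The function name and the confusion-matrix counts are hypothetical; the counts are chosen only to mirror a prevalence of about 4.8% and are not results from the study.

```python
from math import sqrt


def norm_mcc(tp, fp, tn, fn):
    """Normalized Matthews correlation coefficient, (MCC + 1) / 2.

    MCC is taken as 0 when any marginal total is zero (undefined denominator).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
    return (mcc + 1) / 2


# Hypothetical test set of 1000 women with 48 cases (4.8% prevalence).
# A classifier that predicts "no symptoms" for everyone:
always_negative = norm_mcc(tp=0, fp=0, tn=952, fn=48)
accuracy = (0 + 952) / 1000
print(accuracy, always_negative)  # 0.952 0.5
```

Such a classifier reaches 95.2% accuracy while its normMCC stays at 0.5, the value for predictions no better than chance, which is why small differences in normMCC around 0.5 are still informative for this kind of data.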

14
A Cross-Sectional Survey to Estimate the Prevalence of Family History of Colorectal, Breast, and Ovarian Cancer in Derna City, Libya

Alghazali, M. A.; AbdulKareem, E. A.; ElShaihani, A. R.; ElGabaili, R. F.; Erkhais, J. A.

2026-02-02 public and global health 10.64898/2026.01.27.26343764 medRxiv
Top 0.2%
1.3%

Background: Family history of cancer is a well-established risk factor for several malignancies, including colorectal, breast, and ovarian cancers. Estimating the prevalence of familial cancer history is essential for identifying high-risk populations and guiding targeted prevention strategies. Objective: This study aimed to estimate the prevalence of family history of colorectal, breast, and ovarian cancer among residents of Derna City, Libya. Methods: A cross-sectional survey was conducted among 300 participants aged 17-45 years, selected using stratified random sampling. Data were collected through structured questionnaires covering sociodemographic characteristics and family history of cancer. Descriptive statistical analyses were performed to estimate prevalence rates. Results: The mean age of participants was 24.65 ± 4.70 years, with the majority under 25 years of age (67.3%). Females constituted 79.9% of the sample, and most participants had a university-level education (93.5%). A family history of breast cancer was reported by 30.0% of participants, followed by colorectal cancer (23.3%) and ovarian cancer (13.3%). These findings indicate a substantial proportion of individuals with potential genetic susceptibility to these cancers within the study population. Conclusion: A notable prevalence of family history of colorectal, breast, and ovarian cancers was observed in Derna City. These results underscore the importance of incorporating family history assessment into routine healthcare practice and strengthening genetic counseling, screening, and public awareness programs. Targeted prevention strategies may help reduce the burden of hereditary cancers in this region.

15
An intuitive sampling framework for setting-specific decision-making in soil-transmitted helminthiasis control programs

Kazienga, A.; Levecke, B.; de Vlas, S. J.; Coffeng, L. E.

2026-02-14 epidemiology 10.64898/2026.02.11.26346062 medRxiv
Top 0.2%
1.3%

Background: We recently developed a general egg count framework to support cost-efficient survey design choices to inform soil-transmitted helminthiasis (STH) control programs. Yet, its interpretation and application were not always intuitive for program managers. Methods: We first adapted the existing framework to make the interpretation of risks of incorrect decision-making more intuitive and to allow for prior information. Then, we assessed the impact of the allowable risk of incorrect decision-making and prior information on the required sample size. Finally, we determined the most cost-efficient survey design to inform the decisions (i) to switch to an event-based deworming program, and (ii) to declare STH eliminated as a public health problem (EPHP). Principal findings: The required sample sizes increased when the allowable risk of an incorrect decision was reduced and when the mean prior approached the program prevalence threshold. For the decisions to switch to event-based deworming and to declare EPHP, we found that duplicate Kato-Katz thick smears on a single stool sample were the most cost-efficient survey design, particularly when accounting for the added benefits of the free internal quality control. The required sample size for these survey designs varied between program targets and STH species. When aiming to have one sample size that fits all STHs, we recommend sampling 6 schools and 56 children per school for decisions on switching to event-based control programs and 11 schools (74 children per school) for the decision to declare EPHP. Conclusions/significance: We developed an intuitive sampling framework for setting-specific decision-making in STH control programs. We identified the most cost-efficient survey designs for critical program decisions, but these are based on subjective but reasonable choices regarding the risk of incorrect decision-making. Reaching consensus within the STH community on acceptable levels of risk is crucial to further support evidence-based decision-making. Author summary: We recently developed a general computer simulation framework to support cost-efficient survey design choices for the control of intestinal worms. However, its interpretation was not always intuitive and it did not allow incorporation of prior knowledge on the prevalence of infections that programs might have. In this study, we adapted our framework to make the risks of incorrect decision-making more intuitive to interpret and to incorporate prior information on worm prevalence. We then quantified how different risk tolerances and prior prevalence assumptions affected required survey designs. Using this framework, we then identified the most cost-efficient survey designs for two key program decisions: switching to event-based deworming and declaring elimination of intestinal worms as a public health problem. We found that lower tolerance for incorrect decisions and greater uncertainty around prior prevalence substantially increase required sample sizes. Across the different program decisions and worm species, examining duplicate Kato-Katz thick smears from a single stool sample was consistently the most cost-efficient design, with the added benefit of internal quality control. Our results provide practical guidance for designing surveys tailored to local settings and highlight the importance of reaching consensus on acceptable levels of decision-making risk to support evidence-based STH control.

16
Sample size in social contact surveys for epidemic modelling

Danon, L.; Brooks-Pollock, E.

2026-03-31 epidemiology 10.64898/2026.03.30.26349407 medRxiv
Top 0.3%
1.2%

Background: Social contact surveys, which measure who-contacts-whom, are widely used to inform infectious disease transmission models and estimate the reproduction number (R), a key metric for assessing epidemic risk. Despite their widespread use, sample size calculations are not routinely performed. Aims: To assess the impact of sample size on estimates of R and determine a practical target sample size for social contact surveys used in epidemic modelling. Methods: We conducted a review of social contact surveys (2008-2025) to characterise current practice. We characterised the impact of survey size on epidemic metrics using two social contact surveys, the UK Social Contact Survey and POLYMOD (Europe), and two methods. For each dataset and approach, we generated repeated subsamples and calculated the resulting reproduction numbers, characterised their distributions and measured uncertainty. Results: We identified 107 unique social contact surveys from 57 studies. Sample sizes ranged from 30 to more than 10,000 participants, with a median of 1,438. One quarter of surveys contained fewer than 1,000 participants. From our simulations, we find that sample sizes below 200 individuals can result in highly variable reproduction numbers. Increasing sample size increases precision, and the most meaningful gains are up to 1,300 individuals. Increasing sample sizes over 3,000 individuals leads to smaller gains. Conclusions: A minimum sample size of approximately 1,200-1,300 participants appears sufficient for general-purpose use. These findings support the inclusion of sample size considerations in the design, reporting and interpretation of social contact surveys used for epidemic intelligence and public health decision-making.

17
Prevalence of Non-communicable diseases among the pregnant women in selected three teagardens of Sreemongol Upazila in Moulvibazar district

Abdullah, A. S. M.; Haq, F.; Dalal, K.

2026-03-26 epidemiology 10.64898/2026.03.22.26348744 medRxiv
Top 0.3%
1.2%

Bangladesh is experiencing an emerging burden of non-communicable diseases (NCDs). NCDs are emerging as a major cause of morbidity and mortality, accounting for 61% of deaths in Bangladesh. The study aims to describe the prevalence of NCDs among pregnant women in teagardens in Moulvibazar district. Three teagardens of Sreemongol upazila in Moulvibazar district were selected randomly. Pregnant women were considered for collecting NCD-related information. A sample size of 86 was purposively selected based on a review of the relevant literature. Data were collected through face-to-face interviews with the respondents using a pre-tested semi-structured questionnaire. Data were analyzed using SPSS Version 24. For effective use of limited resources, an increased understanding of the shifting burden and better characterization of risk factors of NCDs, including hypertension, is needed. The average age of the women who attended the screening test was 23 (15-45) years. More than 47% of the women were gravida 1. The mean duration of pregnancy was 18.8 weeks. More than 24% of women with GDM had low blood pressure, while 2% had high blood pressure. By BMI, 28% were underweight and 11% were overweight. On the blood sugar challenge test, 12.7% of the women were GDM positive (7.8 to <11 mmol/L). About 16.5% had complications during pregnancy, including anaemia, eclampsia, edema and diarrhoea. A community-based NCD surveillance model could be developed with the participation of government health managers, experts and stakeholders, and taken up by the local health system for implementation.

18
Transportability of missing data models across study sites for research synthesis

Thiesmeier, R.; Madley-Dowd, P.; Ahlqvist, V.; Orsini, N.

2026-03-10 epidemiology 10.64898/2026.03.09.26347913 medRxiv
Top 0.3%
1.1%

Introduction: Systematically missing covariates are a common challenge in medical research synthesis of quantitative data, particularly when individual participant data cannot be shared across study sites. Imputing covariate values in studies where they are systematically unobserved using information from sites where the covariate is observed implicitly assumes similarity of associations across studies. The behaviour of this assumption, and the bias arising from violating it, remains difficult to qualitatively reason about. Here, we evaluated a two-stage imputation approach for handling systematically missing covariates using simulations across a range of statistical and causal heterogeneity scenarios. Methods: We conducted a simulation study with varying degrees of between-study heterogeneity and systematic differences in model parameters. A binary confounder was set to systematically missing in half of the studies. Study-specific effect estimates were combined using a two-stage meta-analytic model. The performance of the imputation approach was evaluated with the primary estimand being the pooled conditional confounding-adjusted exposure effect across all studies. Results: Bias in the pooled adjusted effect estimate was small across scenarios with low to substantial between-study heterogeneity. Bias increased monotonically with increasingly pronounced differences in causal structures across study sites. Coverage remained close to the nominal level under low to substantial between-study heterogeneity, but deteriorated markedly as differences in causal structures between study sites became more severe. Conclusion: The two-stage cross-site imputation approach produced valid pooled effect estimates across a wide range of simulated scenarios but showed monotonic sensitivity to differences in causal structures across studies. The results provide insight into the conditions under which cross-site imputation may be appropriate for handling systematically missing covariates in research synthesis.

19
Corpus for Benchmarking Clinical Speech De-identification

Dai, H.-J.; Fang, L.-C.; Mir, T. H.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.

2026-04-03 health informatics 10.64898/2026.03.31.26349906 medRxiv
Top 0.3%
1.1%

Objectives: Publicly available datasets dedicated to clinical speech de-identification tasks remain scarce due to privacy constraints and the complexity of speech-level annotation. To address this gap, we compiled the SREDH-AICup sensitive health information (SHI) speech corpus, a time-aligned clinical speech dataset annotated across 38 SHI categories. Methods: Two publicly available English medical-domain datasets were adapted to support speech-level de-identification, including script reformulation and controlled re-recording by 25 participants. Additional Mandarin Chinese clinical-style materials were incorporated to extend linguistic coverage. All audio data were annotated with million-level, time-aligned SHI spans using Label Studio. Inter-annotator agreement was evaluated using Cohen's kappa, following iterative calibration rounds. The resulting corpus supports both automatic speech recognition (ASR) and speech-level recognition of SHIs. Results: The final dataset comprises 20 hours of annotated audio, divided into training (10 hours, 1,539 files), validation (5 hours, 775 files), and test (5 hours, 710 files) subsets, totalling 7,830 SHI entities. The language distribution reflects the composition of the selected source materials, with 19.36 hours of English and 0.89 hours of Mandarin Chinese speech. Discussion: The corpus exhibits a long-tail distribution consistent with clinical documentation patterns and highlights the limited availability of Chinese medical speech resources. These characteristics underscore both the realism of the dataset and structural challenges associated with multilingual speech de-identification. Conclusion: The SREDH-AICup SHI speech corpus provides a clinically grounded, time-aligned speech dataset supporting automated medical speech de-identification research and facilitating future development of multilingual speech-based privacy protection systems.

20
Activities of aqueous extract of Tamarindus indica on phenylhydrazine-induced hematopathological changes in anemic male wistar rats

Dare, S. S.; Stephen, C. P.; Mario, E. F.

2026-02-02 systems biology 10.64898/2026.01.30.702729 medRxiv
Top 0.3%
1.0%

Introduction: Drug Induced Hemolytic Anemia (DIHA), following exposure to hematopathologically profound molecules, presents with variable clinical syndromes, misinterpreted serological results, misdiagnosis, challenging and controversial treatment, and no specific antihemolytic agent. Its treatment could be enhanced by use of natural molecules in some medicinal plants. Therefore, this study aimed to determine the activities of aqueous extract of T. indica on PHZ-induced hematopathological changes in anemic male Wistar rats. Materials and Methods: 60 mg/kg of Phenylhydrazine (PHZ) was administered intraperitoneally for 2 days to induce hemolytic anemia. 30 male Wistar rats were randomly divided into 5 groups, each with 6 rats. G1-untreated. Anemic rats were divided into G2-G5. G2-untreated, G3-treated with 1 mL Ferro B syrup, G4 and G5 treated with 400 mg/kg and 800 mg/kg of T. indica pulp extract respectively. Test drug and extract were orally administered daily for 7 and 14 days respectively. Cases in G2-G5 were sacrificed under light ether anesthesia on days 9 and 16 post-therapy, G1 at the end of the experimental period. Blood collected via cardiac puncture was subjected to Red Blood Cell (RBC) histopathology, serum Lactate Dehydrogenase (LDH), and reticulocyte counts. The femur was harvested for bone marrow histopathology. Results: PHZ induced hemolytic anemia marked by profound serum LDH elevation and reticulocytosis, marked RBC morphological distortions, and bone marrow degenerative changes suggestive of marrow fibrosis and suppression. Marrow regeneration marked by hypercellularity and decreased adipocytes was evidence of hematopoiesis induced by the 2-week test therapies; significant moderate populations of normal mature peripheral RBCs and reductions in serum LDH and reticulocyte % were typical, consistent with significant recovery from the acute hemolytic episode. Conclusion: T. indica fruit pulp extract effectively stimulated hematopoiesis in response to drug induced hemolytic effect on the hematopathologic parameters, with significant improvement from hemolytic anemia.